-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[Clang] Allow Finding across only parts of an AST. #127423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
This is relevant for clang modules, as they are imported into the AST, but are actually part of a different TU. It can result in hundreds of milliseconds of additional time to also traverse the AST of these modules, and often for no benefit, as they are frequently already traversed in their own TU.
fb532e7 to
b3d8ac7
Compare
|
@llvm/pr-subscribers-clang-modules @llvm/pr-subscribers-clang Author: Matt (matts1) ChangesThis is relevant for clang modules, as they are imported into the AST, but are actually part of a different TU. It can result in hundreds of milliseconds of additional time to also traverse the AST of these modules, and often for no benefit, as they are frequently already traversed in their own TU. Without this PR, our raw pointer plugin to detect unsafe uses of raw pointers can add up to 1 second of overhead to check a file that only contains the line See https://issues.chromium.org/issues/351909443 for details and benchmarks. Full diff: https://github.com/llvm/llvm-project/pull/127423.diff 4 Files Affected:
diff --git a/clang/include/clang/ASTMatchers/ASTMatchFinder.h b/clang/include/clang/ASTMatchers/ASTMatchFinder.h
index a387d9037b7da..2b161a574d5b6 100644
--- a/clang/include/clang/ASTMatchers/ASTMatchFinder.h
+++ b/clang/include/clang/ASTMatchers/ASTMatchFinder.h
@@ -139,6 +139,13 @@ class MatchFinder {
///
/// It prints a report after match.
std::optional<Profiling> CheckProfiling;
+
+ /// Whether to traverse a Decl. This is relevant for clang modules, as they
+ /// are imported into the AST, but are actually part of a different TU.
+ /// It can result in hundreds of milliseconds of additional time to also
+ /// traverse the AST of these modules, and often for no benefit, as they
+ /// are frequently already traversed in their own TU.
+ std::optional<llvm::function_ref<bool(const Decl &)>> ShouldTraverseDecl;
};
MatchFinder(MatchFinderOptions Options = MatchFinderOptions());
diff --git a/clang/lib/ASTMatchers/ASTMatchFinder.cpp b/clang/lib/ASTMatchers/ASTMatchFinder.cpp
index 3d01a70395a9b..5d2f2065ceba1 100644
--- a/clang/lib/ASTMatchers/ASTMatchFinder.cpp
+++ b/clang/lib/ASTMatchers/ASTMatchFinder.cpp
@@ -1443,7 +1443,8 @@ bool MatchASTVisitor::objcClassIsDerivedFrom(
}
bool MatchASTVisitor::TraverseDecl(Decl *DeclNode) {
- if (!DeclNode) {
+ if (!DeclNode || (Options.ShouldTraverseDecl &&
+ !(*Options.ShouldTraverseDecl)(*DeclNode))) {
return true;
}
diff --git a/clang/unittests/ASTMatchers/ASTMatchersTest.h b/clang/unittests/ASTMatchers/ASTMatchersTest.h
index ad2f5f355621c..02bdcc3a3ab1f 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersTest.h
+++ b/clang/unittests/ASTMatchers/ASTMatchersTest.h
@@ -59,6 +59,11 @@ class VerifyMatch : public MatchFinder::MatchCallback {
const std::unique_ptr<BoundNodesCallback> FindResultReviewer;
};
+inline ArrayRef<TestLanguage> langCxx11() {
+ static const TestLanguage Result[] = {Lang_CXX11};
+ return ArrayRef<TestLanguage>(Result);
+}
+
inline ArrayRef<TestLanguage> langCxx11OrLater() {
static const TestLanguage Result[] = {Lang_CXX11, Lang_CXX14, Lang_CXX17,
Lang_CXX20, Lang_CXX23};
@@ -91,9 +96,11 @@ testing::AssertionResult matchesConditionally(
const Twine &Code, const T &AMatcher, bool ExpectMatch,
ArrayRef<std::string> CompileArgs,
const FileContentMappings &VirtualMappedFiles = FileContentMappings(),
- StringRef Filename = "input.cc") {
+ StringRef Filename = "input.cc",
+ MatchFinder::MatchFinderOptions Options =
+ MatchFinder::MatchFinderOptions()) {
bool Found = false, DynamicFound = false;
- MatchFinder Finder;
+ MatchFinder Finder(Options);
VerifyMatch VerifyFound(nullptr, &Found);
Finder.addMatcher(AMatcher, &VerifyFound);
VerifyMatch VerifyDynamicFound(nullptr, &DynamicFound);
@@ -147,11 +154,13 @@ testing::AssertionResult matchesConditionally(
template <typename T>
testing::AssertionResult
matchesConditionally(const Twine &Code, const T &AMatcher, bool ExpectMatch,
- ArrayRef<TestLanguage> TestLanguages) {
+ ArrayRef<TestLanguage> TestLanguages,
+ MatchFinder::MatchFinderOptions Options =
+ MatchFinder::MatchFinderOptions()) {
for (auto Lang : TestLanguages) {
auto Result = matchesConditionally(
Code, AMatcher, ExpectMatch, getCommandLineArgsForTesting(Lang),
- FileContentMappings(), getFilenameForTesting(Lang));
+ FileContentMappings(), getFilenameForTesting(Lang), Options);
if (!Result)
return Result;
}
@@ -162,8 +171,10 @@ matchesConditionally(const Twine &Code, const T &AMatcher, bool ExpectMatch,
template <typename T>
testing::AssertionResult
matches(const Twine &Code, const T &AMatcher,
- ArrayRef<TestLanguage> TestLanguages = {Lang_CXX11}) {
- return matchesConditionally(Code, AMatcher, true, TestLanguages);
+ ArrayRef<TestLanguage> TestLanguages = {Lang_CXX11},
+ MatchFinder::MatchFinderOptions Options =
+ MatchFinder::MatchFinderOptions()) {
+ return matchesConditionally(Code, AMatcher, true, TestLanguages, Options);
}
template <typename T>
diff --git a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
index 068cf66771027..02badc50241d2 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
+++ b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
@@ -28,6 +28,18 @@ TEST(DeclarationMatcher, hasMethod) {
cxxRecordDecl(hasMethod(isPublic()))));
}
+TEST(DeclarationMatcher, shouldTraverse) {
+ MatchFinder::MatchFinderOptions Options;
+ Options.ShouldTraverseDecl = [](const Decl &decl) { return true; };
+ EXPECT_TRUE(matches("class A { void func(); };",
+ cxxRecordDecl(hasMethod(hasName("func"))), langCxx11(),
+ Options));
+ Options.ShouldTraverseDecl = [](const Decl &decl) { return false; };
+ EXPECT_FALSE(matches("class A { void func(); };",
+ cxxRecordDecl(hasMethod(hasName("func"))), langCxx11(),
+ Options));
+}
+
TEST(DeclarationMatcher, ClassDerivedFromDependentTemplateSpecialization) {
EXPECT_TRUE(matches(
"template <typename T> struct A {"
|
I'm not a modules or matcher expert, but this doesn't match my (perhaps uninformed) understanding. I thought that when we import a module, it is part of our TU, similarly to when we include a header. I thought what we wanted to do was avoid walking the parts of the AST that come from a module but aren't used in the TU, and therefore haven't been deserialized. |
You are correct. I phrased it very poorly. I'll reclarify - if we have a header file we depend on, then if the header file came from a different module, it would have already been checked when we compiled that module, so there's no need to recheck it.
Consider: // module.cpp
#include "module.h"// lib.cpp
#include "lib.h"
#include "module.h"To compile clang++ -emit-module -o module.pcm -add-plugin raw-ptr-plugin module.cpp
clang++ -emit-module -o lib.pcm -add-plugin raw-ptr-plugin lib.cpp -fmodule-file module.pcmThe raw plugin will check |
|
Is there a particular reason to have this be a generic predicate rather than just a bool of if other modules should be visited? I can see how you could do other things with it like skipping a namespace or something, but I'm not sure if the other use cases would actually see use. If there is a reason for a generic predicate, then I think this implementation is fine. |
|
I think other use cases could see use. For example, you could write a predicate to check whether it's in the std namespace to avoid checking the standard library, or a predicate to check whether a class is a subclass of I don't personally need the generic predicate though, so if you'd prefer I can do a bool. |
Bigcheese
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good from the modules end and I get how the predicate can be useful. My only concern is that this feels a bit like another matcher that's not using the matcher API, so I'd like someone more familiar with it to also approve, maybe @5chmidti or @HighCommander4.
|
We've encountered a similar use case in clangd -- for fast incremental reparses, we split the file into a "preamble" which we build as a PCH, and a "main file" which consumes the PCH, and then we want to run e.g. clang-tidy checkers only on the "main file" part of the AST -- and we address it by using Perhaps this approach could be used for modules as well? |
|
What is the current status of this PR? @matts1 is it possible to apply the idea from @HighCommander4 ? |
|
I've currently put it on hold since I'm prioritising allowing config mismatches |
|
Can confirm that this approach appears to work. |
This is relevant for clang modules, as they are imported into the AST, but are actually part of a different TU. It can result in hundreds of milliseconds of additional time to also traverse the AST of these modules, and often for no benefit, as they are frequently already traversed in their own TU.
Without this PR, our raw pointer plugin to detect unsafe uses of raw pointers can add up to 1 second of overhead to check a file that only contains the line
#include <cstddef>. With this PR, we can use the same logic as in #127161 to reduce this overhead to <10ms, a 100x speedup.See https://issues.chromium.org/issues/351909443 for details and benchmarks.